Lehnen reports that these measures were often criticized as vague and as poor
indicators of actual performance.  He concludes that policymakers have few good
measures of performance and that anecdotal information is often as credible as
national statistics.  (This observation supports Hersh's call for case
studies.)

Further, Lehnen recommends that NCES determine what kinds of measures should be
collected via a public process that involves interest groups, policymakers, and
education professionals.  "Without such information, the Nation's policymakers
cannot effectively evaluate the Nation's schools and develop programs to remedy
deficiencies."

Walberg urges that new tests and testing procedures be developed that take
advantage of the technology of the moment (computers) and the concept of
"tailored-testing" which adapts test items to students. Under this approach,
the most discriminating test items would be assigned to each student so that 15
items would yield scores as reliable as 90 batched items suited to the average
student.  Smith and others believe that the use of computers should enable one
to assess the higher order skills that go beyond the basic skills tested by
NAEP, HSB, and IEA.

The need for definition and measurement of critical thinking and higher order
skills is a recurring theme. Buccino questions whether current tests measure
higher order skills.  Like Smith, Scott-Jones urges the development of
appropriate test items.  (See also Thomas, Bishop, B. Turnbull.)

Eubanks argues that tests to assess higher order skills do exist.  He describes
the Degrees of Reading Power (DRP) developed by the College Board and the Word
Test being validated by Carver to evaluate reading comprehension as opposed to
merely sounding out words, and the Lauton Formal Operations Test to evaluate the
development of thinking skills as opposed to rote memorization.

Wilkinson calls for NCES to play a strong role in developing indicators of
classroom learning, development, and achievement.  In addition to group and
individually administered tests, she calls for tools to assess social and
communicative achievement, differences in achievement due to cultural and
situational factors, and "direct observation of students' naturally occurring
behavior in a variety of classroom situations," asserting that such knowledge in
children "mediates both the teaching and learning of academic subject matter in
classrooms."

In contrast to the majority of writers, who favor large-scale data collection,
Hersh takes a different approach to the need for outcome information.  While
supporting the collection of standardized test data, he argues that the only
way to measure higher order skills involving analytical learning effectively is
to conduct hundreds of in-depth case studies, whose results would illuminate
what organizational efficacy means for a particular school.  Used as an
assessment of organizational efficacy, the case study approach would show which
school conditions, working together, appear to explain student achievement as
well as student and faculty satisfaction.

ACHIEVEMENT

There does not appear to be any quick fix to the problems associated with
definition and measurement of achievement. But while solutions are being
sought, measurement using the available tools must continue. SAT and ACT, NAEP,

schools and pupils; and how teacher attributes
